Brief Description


The outline of this project is shown below.

1. Content Reconstruction (30 pts)

Hyperparameter Tuning

Source Image

Ablation on model (vanilla, stylegan)

I selected the StyleGAN model because its adaptive instance normalization (AdaIN) layers and style-based generator architecture enable it to generate images with realistic and diverse features. Compared with the vanilla GAN, StyleGAN produces noticeably higher-quality images.

0_vanilla_z_0.01_l1_1000
0_stylegan_z_0.01_l1_1000

Ablation on latent (z, w, w+)

I selected the w+ latent space. It extends the intermediate latent space w by using a separate w vector for each generator layer, which provides finer-grained control over the generated images.
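To make the difference between the three latent spaces concrete, here is a minimal sketch in plain Python. The dimensions, the layer count, and `toy_mapping` are hypothetical stand-ins, not the actual StyleGAN mapping network; the point is only that w+ holds one w vector per layer, so each layer's style can be adjusted independently.

```python
import random

LATENT_DIM = 4   # hypothetical; the real latent dimension is much larger
NUM_LAYERS = 3   # hypothetical number of style-modulated generator layers


def toy_mapping(z):
    """Hypothetical stand-in for the mapping network that turns z into w."""
    return [0.5 * v + 0.1 for v in z]


rng = random.Random(0)

# z: a single noise vector sampled from a standard normal distribution.
z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

# w: a single intermediate vector shared by every generator layer.
w = toy_mapping(z)

# w+: one copy of w per layer; each copy can then be optimized on its own.
w_plus = [list(w) for _ in range(NUM_LAYERS)]

# Changing the first layer's code leaves the other layers untouched.
w_plus[0][0] += 1.0

print("free parameters in z: ", len(z))
print("free parameters in w: ", len(w))
print("free parameters in w+:", sum(len(layer) for layer in w_plus))
```

With more free parameters, w+ can usually fit a target image more closely than z or w, at the cost of a larger search space.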

0_stylegan_z_0.01_l1_1000
0_stylegan_w_0.01_l1_1000
0_stylegan_w+_0.01_l1_1000

Ablation on loss_type (l1, l2)

I opted for l1 as the loss_type. The l1 loss penalizes each error in proportion to its magnitude, whereas the l2 loss squares the errors, so it is dominated by large errors and barely penalizes small ones. As a result, the l1 loss tends to encourage images with sharper edges and more pronounced features.
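As a small worked example of this difference, the sketch below computes both losses on a hypothetical list of per-pixel values (illustrative numbers, not taken from the experiments). It also reports how much of each loss comes from the single large error.

```python
def l1_loss(pred, target):
    """Mean absolute error: each error counts in proportion to its size."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error: errors are squared, so large ones dominate."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical pixel values: three small errors and one large error.
target = [0.0, 0.0, 0.0, 0.0]
pred = [0.1, 0.1, 0.1, 1.0]

l1 = l1_loss(pred, target)
l2 = l2_loss(pred, target)

# Fraction of each loss contributed by the single large error (last pixel).
errors = [abs(p - t) for p, t in zip(pred, target)]
large_share_l1 = errors[-1] / sum(errors)
large_share_l2 = errors[-1] ** 2 / sum(e ** 2 for e in errors)

print(f"l1 = {l1:.4f}, l2 = {l2:.4f}")
print(f"share from the large error: l1 = {large_share_l1:.2f}, l2 = {large_share_l2:.2f}")
```

Under l2 the three small errors contribute almost nothing, so the optimizer has little reason to correct them; under l1 they still carry a meaningful share of the loss.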

0_vanilla_z_0.01_l1_1000
0_stylegan_z_0.01_l1_1000

Ablation on perc_wgt (0, 0.1, 0.01)

I selected a perc_wgt value of 0.01. A higher perceptual loss weight, such as perc_wgt = 0.1, tends to cause the generator to focus excessively on the reference image, leading to outputs that lack creativity. Conversely, using no perceptual loss (perc_wgt = 0) results in perceptually unrealistic and unappealing outputs.
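The role of perc_wgt can be sketched as follows, assuming the total loss has the form pixel loss + perc_wgt × perceptual loss. That form is an assumption about the objective, and `toy_features` is a hypothetical stand-in for a real feature extractor, used only to keep the example self-contained.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def toy_features(image):
    """Hypothetical feature extractor: differences between neighbouring pixels."""
    return [image[i + 1] - image[i] for i in range(len(image) - 1)]


def total_loss(generated, target, perc_wgt):
    """Assumed objective: pixel loss plus a weighted perceptual (feature) loss."""
    pixel = l1(generated, target)
    perceptual = l1(toy_features(generated), toy_features(target))
    return pixel + perc_wgt * perceptual


# Hypothetical 1-D "images" for illustration only.
target = [0.0, 0.5, 1.0, 0.5]
generated = [0.1, 0.4, 0.8, 0.6]

for perc_wgt in (0, 0.01, 0.1):
    print(f"perc_wgt = {perc_wgt}: total loss = {total_loss(generated, target, perc_wgt):.4f}")
```

With perc_wgt = 0 the perceptual term vanishes and only the pixel loss is optimized; larger weights shift the balance toward matching features.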

0_stylegan_w+_0_l1_1000
0_stylegan_w+_0.1_l1_1000
0_stylegan_w+_0.01_l1_1000

Run time for Ablations

Using a single RTX 3090 GPU, the vanilla GAN took 8.711 seconds to run, while StyleGAN's runtime ranged from 25.298 to 26.602 seconds, depending on the selected hyperparameters.

Visual Results

Now that the hyperparameter tuning is done, let's look at some visual results.

1_data
1_stylegan_w+_0.01_l1_1000
2_data
2_stylegan_w+_0.01_l1_1000
3_data
3_stylegan_w+_0.01_l1_1000
5_data
5_stylegan_w+_0.01_l1_1000
6_data
6_stylegan_w+_0.01_l1_1000
8_data
8_stylegan_w+_0.01_l1_1000

2. Interpolate your Cats (10 pts)

Visual Results

The results of the interpolation experiments are presented below as GIFs. The resulting GIFs are visually appealing, realistic, and coherent.
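A minimal sketch of how such an interpolation can be set up, assuming linear interpolation between two latent codes obtained by projecting two images. The codes and the frame count below are hypothetical, and the generator call that would turn each code into a GIF frame is omitted.

```python
def lerp(start, end, t):
    """Linearly interpolate between two latent codes for t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(start, end)]


def interpolation_path(start, end, num_frames):
    """Return num_frames latent codes evenly spaced from start to end."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    return [lerp(start, end, i / (num_frames - 1)) for i in range(num_frames)]


# Hypothetical latent codes for two projected images.
w_start = [0.0, 1.0, -1.0]
w_end = [1.0, 0.0, 1.0]

frames = interpolation_path(w_start, w_end, num_frames=5)
for code in frames:
    # In the real pipeline each code would be passed to the generator.
    print([round(v, 2) for v in code])
```

For a w+ code, the same interpolation would be applied to every per-layer vector.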

1_stylegan_w+
3_stylegan_w+
5_stylegan_w+
7_stylegan_w+

3. Scribble to Image (40 pts)

Visual Results

Below, I present some results for the scribble-to-image task. While the output images resemble cats, they suffer from distortions, artifacts, and an excessive use of blue. This task is challenging because hand-drawn sketches are incomplete and ambiguous, which makes them difficult to interpret and translate into coherent images.
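One common way to constrain the output with a sparse scribble is a masked loss, where only the pixels covered by the mask contribute. The sketch below assumes that setup and an l1 penalty averaged over the masked pixels; both are assumptions, and the pixel values are hypothetical.

```python
def masked_l1(generated, sketch, mask):
    """L1 loss over pixels where mask is 1; all other pixels are unconstrained."""
    total = sum(m * abs(g - s) for g, s, m in zip(generated, sketch, mask))
    count = sum(mask)
    return total / count if count else 0.0


# Hypothetical 1-D example: the scribble covers only pixels 0 and 2.
sketch = [0.2, 0.0, 0.9, 0.0]
mask = [1, 0, 1, 0]
generated = [0.3, 0.7, 0.9, 0.1]

loss = masked_l1(generated, sketch, mask)
print(f"masked l1 loss = {loss:.4f}")

# Changing an unmasked pixel does not change the loss.
altered = [0.3, -5.0, 0.9, 0.1]
print(f"after altering an unmasked pixel = {masked_l1(altered, sketch, mask):.4f}")
```

Because unmasked pixels are free, the generator prior has to fill them in, which is one plausible source of the artifacts described above.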

0_data
0_mask
0_stylegan_w+_0.01_1000
1_data
1_mask
1_stylegan_w+_0.01_1000
2_data
2_mask
2_stylegan_w+_0.01_1000
3_data
3_mask
3_stylegan_w+_0.01_1000
4_data
4_mask
4_stylegan_w+_0.01_1000

4. EC: Stable Diffusion (10 pts)

I used Stable Diffusion to generate a set of images from text prompts. The results are presented below.

Visual Results

"A black cat scribble with a big smile"

(four generated images)

"A brown cat scribble with an very angry looking"

(four generated images)

"A white cat scribble with a big head and a curious looking"

(four generated images)

5. EC: High-res Grumpy Cat (2 pts)

Visual Results 128

I conducted image generation experiments on grumpy cat images at a resolution of 128 × 128. The resulting images are displayed below.

0_data
0_stylegan128_w+_0.01_l1_1000
1_data
1_stylegan128_w+_0.01_l1_1000
2_data
2_stylegan128_w+_0.01_l1_1000
3_data
3_stylegan128_w+_0.01_l1_1000
4_data
4_stylegan128_w+_0.01_l1_1000
5_data
5_stylegan128_w+_0.01_l1_1000

Visual Results 256

I also conducted image generation experiments on grumpy cat images at a resolution of 256 × 256. The results are presented below.

0_data
0_stylegan256_w+_0.01_l1_1000
1_data
1_stylegan256_w+_0.01_l1_1000
2_data
2_stylegan256_w+_0.01_l1_1000
3_data
3_stylegan256_w+_0.01_l1_1000
4_data
4_stylegan256_w+_0.01_l1_1000
5_data
5_stylegan256_w+_0.01_l1_1000

6. EC: Afhqcat Dataset (2 pts)

Visual Results

Additionally, I conducted image generation experiments on the Afhqcat dataset; the results are presented below. Note that generating high-quality images on the Afhqcat dataset, which has a resolution of 512 × 512, is a challenging task.

0_data
0_afhqcat_w+_0.01_l1_1000
1_data
1_afhqcat_w+_0.01_l1_1000
2_data
2_afhqcat_w+_0.01_l1_1000
3_data
3_afhqcat_w+_0.01_l1_1000
4_data
4_afhqcat_w+_0.01_l1_1000
5_data
5_afhqcat_w+_0.01_l1_1000